Learning to Find Context Based Spelling Errors

Authors

  • Hisham Al-Mubaid
  • Klaus Truemper
Abstract

A context-based spelling error is a spelling or typing error that turns an intended word into another word of the language. For example, the intended word “sight” might become the word “site.” A spell checker cannot identify such an error. In the English language, the case of interest here, a syntax checker may also fail to catch such an error since, among other reasons, the parts-of-speech of an erroneous word may permit an acceptable parsing. This chapter presents an effective method called Ltest for identifying the majority of context-based spelling errors. Ltest learns from prior, correct text how context-based spelling errors may manifest themselves, by purposely introducing such errors and analyzing the resulting text using a data mining algorithm. The output of this learning step consists of a collection of logic formulas that in some sense represent knowledge about possible context-based spelling errors. When testing text is subsequently examined for context-based spelling errors, the logic formulas and a portion of the prior text are used to analyze the case at hand and to pinpoint likely errors. Ltest has been added to an existing software system for spell and syntax checking. We have conducted tests involving mathematical, technical, and general texts. On average, Ltest found 68% of context-based spelling errors in large texts and 87% of such errors in small texts. These detection rates are relative to words for which training was possible using the prior text. The number of false-positive diagnoses was small, involving on average 23 word instances (= 0.7% of the possible error instances) of a large text and 1 word instance (= 8% of the possible error instances) of a small text. These statistics indicate that the method is effective for the recognition of the majority of context-based spelling errors considered in the experimental tests.
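The learning step described in the abstract, purposely introducing errors into correct text, can be illustrated with a minimal sketch. This is not the authors' Ltest implementation: the confusion sets, the function names, and the fixed context window are all assumptions made for illustration, and the data mining step that produces logic formulas is not shown. The sketch only builds labeled instances (original word versus injected error, each with its surrounding words) that such a learner could consume.

```python
import re

# Hypothetical confusion sets: words easily typed in place of one another.
# The abstract names only "sight"/"site"; the second set is an assumption.
CONFUSION_SETS = [{"sight", "site"}, {"their", "there"}]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def confusable_with(word):
    """Return the words that could replace `word` by mistake (sorted)."""
    alternatives = set()
    for group in CONFUSION_SETS:
        if word in group:
            alternatives |= group - {word}
    return sorted(alternatives)


def make_training_instances(text, window=2):
    """Build (context, word, is_correct) triples from correct text.

    For every confusable word, emit one positive instance (the original
    word) and one negative instance per purposely injected substitute.
    The context is the pair (left words, right words) within `window`.
    """
    tokens = tokenize(text)
    instances = []
    for i, token in enumerate(tokens):
        alternatives = confusable_with(token)
        if not alternatives:
            continue
        left = tuple(tokens[max(0, i - window):i])
        right = tuple(tokens[i + 1:i + 1 + window])
        context = (left, right)
        instances.append((context, token, True))
        for alt in alternatives:
            instances.append((context, alt, False))  # injected error
    return instances


if __name__ == "__main__":
    for instance in make_training_instances("The site was a wonderful sight."):
        print(instance)
```

Each confusable word in the correct text yields one positive instance and one negative instance per substitute, so the sample sentence produces four instances in total.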

Similar resources

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors. Also, developing Persian tools will provide Persian progr...

Presenting a Ranker for a Semantic Spell Checker Using Context-Sensitive Features

Nowadays, a large volume of documents is generated daily. These documents are generated by different people and therefore contain spelling errors, which decrease the quality of the documents. Automatic writing assistance tools such as spell checkers/correctors can therefore help to improve their quality. Context-sensitive are misspelled words that have been...

Spelling-based Phonics Instruction: Its Effect on English Reading and Spelling in an EFL Context

Systematic phonics instruction in first language education has recently received considerable research attention due to its critical role in facilitating phonological awareness and processing skills. However, little is known about the effects of systematic phonics instruction on foreign language reading and spelling in an EFL context. This study examined the effects of spelling-based phonics in...

Applying Winnow to Context-sensitive Spelling Correction

Multiplicative weight-updating algorithms such as Winnow have been studied extensively in the COLT literature, but only recently have people started to use them in applications. In this paper, we apply a Winnow-based algorithm to a task in natural language: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting ...

Adaptating the Levenshtein Distance to Contextual Spelling Correction

In the last few years, computing environments for human learning have rapidly evolved due to the development of information and communication technologies. However, the use of information technology in automatic correction of spelling errors has become increasingly essential. In this context, we have developed a system for correcting spelling errors in the Arabic language based on language mode...

Publication date: 2001